Method and apparatus for watermarking binary computer code

ABSTRACT

A method and apparatus for inserting a watermark into a compiled computer program. A location process specifies an insertion point in the compiled program and a watermark generating process inserts a watermark, based on data to be encoded, into the program at the insertion point. The location process is also utilized to specify the location of watermark data to be decoded.

BACKGROUND OF THE INVENTION

[0001] It can be useful to identify the code produced by different compilers, for example to detect unlicensed use of a compiler or to track errors. Compiler manufacturers therefore need a method of including a serial number or other information in the code a compiler produces, together with a method of analyzing a copy of the compiled code to recover that serial number or other information.

[0002] A private watermark, which is data hidden via steganography, is one method for embedding data in the outputs of licensed programs. However, traditional steganography requires the presence of “low-order” bits in the data stream. These low-order bits can be changed without altering the data enough for a human to notice the difference, and the changed bits, detected when the modified data is compared with the original, can hold the steganographic data. Because traditional steganography changes non-significant low-order bits, it is normally applied to digital pictures and sounds, which contain such bits.

[0003] Steganography cannot be applied to computer code with these conventional methods because computer code has no low-order bits. Every bit in the code is significant, and changing even one bit can prevent the code from operating correctly.

[0004] Accordingly, improved techniques for inserting identifying watermarks in compiled programs are needed.

BRIEF SUMMARY OF THE INVENTION

[0005] In one embodiment of the invention, a method for generating and auditing a watermark for a compiled computer program is provided. The watermark is an integral part of the program and does not appear as an external data item.

[0006] In another embodiment, a fixed location in the compiled code is specified and a legal fake instruction that does not affect the operation of the code is inserted. For each binary digit of the data to be embedded, one value of the digit is encoded as a first type of fake instruction and the other value of the binary digit is encoded as a second type of fake instruction.

[0007] In another embodiment of the invention, the data itself is inserted into the compiled code at one or more locations determined by a mathematical function. A computer executing the compiled code also knows the function; it determines the location(s) and removes the data before executing the code. If a computer that does not know the function executes the program, the program will crash because the inserted data are not legal instructions.

[0008] In another embodiment of the invention, the data is encrypted prior to being inserted in the code.

[0009] Other features and advantages of the invention will be apparent in view of the following detailed description and appended drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1 is a block diagram of a computer system configured to implement an embodiment of the invention;

[0011] FIG. 2 is a block diagram depicting the operation of an embodiment that encodes the watermark by inserting a fake instruction;

[0012] FIG. 3 is a flowchart of the watermark encoding process of an embodiment of the invention;

[0013] FIG. 4 is a flowchart of the watermark decoding process of an embodiment of the invention; and

[0014] FIG. 5 is a block diagram depicting the operation of an embodiment that encodes the watermark by inserting data to be embedded.

DETAILED DESCRIPTION OF THE INVENTION

[0015] The invention will now be described, by way of example not limitation, with reference to various embodiments. FIG. 1 is a block diagram of a computer system 10 configured to implement an embodiment of the invention. The computer system 10 includes a computer 12, an input device 14 such as a keyboard, and an output device 16 such as a display screen. The computer 12 includes a main memory 18, which may include RAM and NVRAM, and a central processing unit (“CPU”) 20. A compiler 24, a source code module 26, a compiled program 30, a watermark generating process S( ) 32, and a location generating process R( ) 34 are stored in the memory 18. As will be described below, a key, for use by the location generating process, may be stored in the main memory 18.

[0016] A first embodiment of the invention will now be described. The compiler and computer processor agree on a function R( ), which is a location determining function 34 that determines one or more insertion points within a given compiled binary code. R( ) may be a constant function or may depend on the binary. In one embodiment, R( ) is a random number generator seeded by some part of the compiled code. Alternatively, R( ) may be a polynomial with inputs communicated by the compiled binary code.

[0017] In an alternative embodiment, a value to be provided to R( ) can be processor specific and stored in the main memory 18 of the computer.
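By way of illustration only, the following Python sketch shows one possible R( ) of the kind described in paragraphs [0016] and [0017]. It assumes the compiled program is modeled as a list of assembly-text lines numbered from 0, as in FIG. 2; the function name insertion_points( ), the parameters prefix_len, processor_key and max_gap, and the use of SHA-256 with Python's random module are all hypothetical choices, not part of the described method. The generator is seeded by a hash of the first prefix_len lines plus an optional processor-specific key, and every point it returns lies at or beyond prefix_len, so inserting watermark instructions never changes the seed material.

```python
import hashlib
import random
from typing import List


def insertion_points(program: List[str], count: int, prefix_len: int = 4,
                     processor_key: bytes = b"", max_gap: int = 3) -> List[int]:
    """Hypothetical R( ): return `count` strictly increasing line numbers.

    The PRNG is seeded by a SHA-256 hash of the first `prefix_len` lines
    plus an optional processor-specific key. Every returned point is
    >= prefix_len, so inserting watermark lines never alters the seed.
    """
    seed = "\n".join(program[:prefix_len]).encode() + processor_key
    rng = random.Random(hashlib.sha256(seed).digest())
    points: List[int] = []
    point = prefix_len - 1
    for _ in range(count):
        point += rng.randint(1, max_gap)  # offset from the previous point
        points.append(point)
    return points
```

Because the seed depends only on the unchanged prefix (and the optional key), an auditor holding the watermarked program regenerates exactly the same points that the compiler used.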

[0018] The operation of the first embodiment will now be described in more detail with reference to FIGS. 2-4. Referring to FIG. 2, a first block 40 depicts the unmodified program instructions, a second block 42 depicts the data to be added, in this example “1010000”, a third block 44 depicts the insertion point generated, in this example “4”, and a fourth block 46 depicts the resulting modified instructions. In this example it is assumed that the lines of program instructions are numbered sequentially.

[0019] Referring now to the flowchart of FIG. 3, R( ) is called and generates a first insertion point as described above. The first bit of the data to be encoded, “1”, is then encoded as a fake move instruction. The compiler's resources are used to identify unused registers that can serve as the operands of the fake move instruction. Thus, in this case, the “movl %edx, %ebp” instruction does nothing because neither %edx nor %ebp is otherwise used in this function. The presence of the “movl” instruction in the compiled program (a change from the original program) encodes the first “1” bit of the data to be embedded.

[0020] The process then loops to call R( ) again to generate a second insertion point offset from the first insertion point. The second bit, “0”, of the data is encoded. The encoding of the bits can be implemented in various ways.

[0021] In the currently described embodiment, a first fake instruction, e.g., movl, is used to encode “1” and a second fake instruction is used to encode “0”.

[0022] The bit “0” could be encoded by inserting no instruction at all, or as another fake instruction that does nothing, such as an “add” instruction whose operands are unused registers. The fake instruction is then inserted at the new insertion point. The process continues to loop until all the data bits are encoded into the compiled program.
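The encoding loop of FIG. 3 can be sketched as follows, under stated assumptions rather than as the implementation. The program is modeled as a list of assembly-text lines, R( ) is a hypothetical PRNG seeded by the unchanged program prefix, and the two fake instructions and all function names are illustrative. The sketch assumes the compiler has already confirmed that %edx, %ebp and the condition flags are dead at every insertion point; the flags matter because an add instruction, unlike a move, updates them. Since the insertion points strictly increase, each inserted line shifts only the lines after it, and each later point is interpreted in the already-modified program.

```python
import hashlib
import random
from typing import List

# Illustrative fake instructions (AT&T syntax). Assumes %edx, %ebp and the
# condition flags are dead at every insertion point, so neither instruction
# changes the program's behaviour.
FAKE_ONE = "movl %edx, %ebp"
FAKE_ZERO = "addl %edx, %ebp"


def insertion_points(program: List[str], count: int, prefix_len: int = 4,
                     key: bytes = b"", max_gap: int = 3) -> List[int]:
    """Hypothetical R( ): PRNG seeded by the unchanged program prefix."""
    rng = random.Random(hashlib.sha256(
        "\n".join(program[:prefix_len]).encode() + key).digest())
    points, point = [], prefix_len - 1
    for _ in range(count):
        point += rng.randint(1, max_gap)  # strictly increasing, >= prefix_len
        points.append(point)
    return points


def embed_watermark(program: List[str], bits: str, key: bytes = b"") -> List[str]:
    """FIG. 3 loop: insert one fake instruction per bit at each R( ) point."""
    marked = list(program)
    for point, bit in zip(insertion_points(program, len(bits), key=key), bits):
        if point > len(marked):
            raise ValueError("program too short for this watermark")
        marked.insert(point, FAKE_ONE if bit == "1" else FAKE_ZERO)
    return marked
```

Applied to a 20-line program with the FIG. 2 data “1010000”, the sketch adds seven lines, and the original lines keep their relative order.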

[0023] The auditing and/or removal of the encoded data will now be described with reference to the flowchart of FIG. 4. R( ) is called to locate the insertion point. The fake instruction at the insertion point is decoded by S( ) to generate the first bit, “1”, of the encoded data. Deleting the fake instruction is optional because it does not affect the operation of the program. The process then loops to decode, and optionally delete, all the fake instructions encoding the watermark.
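The decoding loop of FIG. 4 might look like the following sketch. It assumes the auditor knows the number of watermark bits in advance (for example, a fixed-length serial number), regenerates the insertion points with the same hypothetical seeded R( ) the compiler used, and maps each fake instruction back to a bit; the instruction table and the function names are illustrative. Optional deletion proceeds from the highest insertion point downward, so removing one instruction does not shift the positions still to be removed.

```python
import hashlib
import random
from typing import List, Tuple

# Illustrative table mapping each fake instruction back to the bit it encodes.
FAKE_ONE = "movl %edx, %ebp"
FAKE_ZERO = "addl %edx, %ebp"
DECODE = {FAKE_ONE: "1", FAKE_ZERO: "0"}


def insertion_points(program: List[str], count: int, prefix_len: int = 4,
                     key: bytes = b"", max_gap: int = 3) -> List[int]:
    """Hypothetical R( ): PRNG seeded by the unchanged program prefix."""
    rng = random.Random(hashlib.sha256(
        "\n".join(program[:prefix_len]).encode() + key).digest())
    points, point = [], prefix_len - 1
    for _ in range(count):
        point += rng.randint(1, max_gap)
        points.append(point)
    return points


def extract_watermark(marked: List[str], n_bits: int, key: bytes = b"",
                      remove: bool = False) -> Tuple[str, List[str]]:
    """FIG. 4 loop: decode the fake instruction at each R( ) point."""
    points = insertion_points(marked, n_bits, key=key)
    bits = []
    for point in points:
        if point >= len(marked) or marked[point] not in DECODE:
            raise ValueError(f"no watermark instruction at line {point}")
        bits.append(DECODE[marked[point]])
    cleaned = list(marked)
    if remove:
        # Delete back to front so earlier insertion points stay valid.
        for point in reversed(points):
            del cleaned[point]
    return "".join(bits), cleaned
```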

[0024] The second embodiment will now be described in more detail with reference to FIG. 5, in which the unencoded watermark data is copied into the program code at the insertion point. The steps are the same as those described above with reference to FIG. 3, except that the data is not encoded as fake instructions that have no effect.

[0025] Removal of the watermark follows the steps described above with reference to FIG. 4, except that the data is not decoded and the removal is mandatory: the inserted data are not legal instructions and would therefore cause the program to crash. An added benefit of this embodiment is that unauthorized users would not be able to use the compiled program.
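For the second embodiment, a minimal byte-level sketch follows. It assumes the compiled code is a flat byte string, that the compiler and the processor share a secret key feeding a hypothetical seeded R( ), and that both sides know the length of the watermark data; the names insert_raw( ) and strip_raw( ) are illustrative. The sketch deliberately ignores the fact that a real binary would also need its branch targets and other offsets adjusted when bytes are inserted or removed.

```python
import hashlib
import random
from typing import List, Tuple


def insertion_points(code: bytes, count: int, prefix_len: int = 16,
                     key: bytes = b"", max_gap: int = 8) -> List[int]:
    """Hypothetical R( ): strictly increasing byte offsets >= prefix_len,
    from a PRNG seeded by the first prefix_len bytes and a shared key."""
    rng = random.Random(hashlib.sha256(code[:prefix_len] + key).digest())
    points, point = [], prefix_len - 1
    for _ in range(count):
        point += rng.randint(1, max_gap)
        points.append(point)
    return points


def insert_raw(code: bytes, data: bytes, key: bytes) -> bytes:
    """Compiler side: copy each data byte, unencoded, into the code."""
    out = bytearray(code)
    for point, byte in zip(insertion_points(code, len(data), key=key), data):
        if point > len(out):
            raise ValueError("code too short for this watermark")
        out.insert(point, byte)
    return bytes(out)


def strip_raw(marked: bytes, length: int, key: bytes) -> Tuple[bytes, bytes]:
    """Processor side: recover the data and remove it before execution.

    Removal is mandatory because the inserted bytes are not valid code."""
    points = insertion_points(marked, length, key=key)
    data = bytes(marked[p] for p in points)
    out = bytearray(marked)
    for point in reversed(points):  # back to front keeps offsets valid
        del out[point]
    return data, bytes(out)
```

A processor that lacks the key computes different offsets, so it neither finds nor removes the inserted bytes, which matches the behavior described for this embodiment.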

[0026] For either embodiment described above, the data inserted as watermarks can be public or private. Private data can be encrypted or made private in some other way.
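As one illustration of making the watermark data private before insertion, the sketch below XORs the data with a keystream derived from SHA-256 in counter mode and then expands the result into the bit string used by the first embodiment. This construction is a stand-in chosen for brevity, not the method of the invention and not a vetted cipher; a real system would more likely use an authenticated cipher such as AES-GCM, and reusing one key for two different watermarks would leak information about both. The function names are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Illustrative SHA-256 counter-mode keystream (not a vetted cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(data: bytes, key: bytes) -> bytes:
    """XOR the watermark data with the keystream before it is inserted."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


# XOR is its own inverse, so the same operation recovers the plaintext.
unseal = seal


def to_bits(data: bytes) -> str:
    """Expand bytes into the "1"/"0" string encoded bit by bit."""
    return "".join(f"{b:08b}" for b in data)
```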

[0027] A large amount of data can be stored in the watermark in this way. In the first embodiment, the watermark data is difficult to find (and thus to strip out). In the second embodiment, the program will not execute on a processor that does not know the function R( ), even if that processor supports the same instruction set.

[0028] The invention has now been described with reference to the preferred embodiments. Alternatives and substitutions will now be apparent to persons of ordinary skill in the art. For example, the types of fake instruction that can encode the data are not limited to the examples described. Additionally, groups of bits or characters could be encoded and inserted at a single insertion point. Accordingly, it is not intended to limit the invention except as provided by the appended claims.
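Encoding groups of bits at a single insertion point, mentioned above as an alternative, can be sketched with a table of four no-effect instructions, each standing for one 2-bit group, which halves the number of insertion points needed. The table, the zero-padding rule and the function names are hypothetical, and the sketch again assumes that %edx, %ebp and the condition flags are dead wherever an instruction is inserted.

```python
from typing import Dict, List

# Hypothetical table: one no-effect instruction per 2-bit group.
# Assumes %edx, %ebp and the flags are dead at every insertion point.
GROUP_TO_INSN: Dict[str, str] = {
    "00": "addl %edx, %ebp",
    "01": "subl %edx, %ebp",
    "10": "movl %edx, %ebp",
    "11": "xorl %edx, %ebp",
}
INSN_TO_GROUP = {insn: group for group, insn in GROUP_TO_INSN.items()}


def encode_groups(bits: str) -> List[str]:
    """Map each 2-bit group to one fake instruction (zero-padding the end)."""
    if len(bits) % 2:
        bits += "0"
    return [GROUP_TO_INSN[bits[i:i + 2]] for i in range(0, len(bits), 2)]


def decode_groups(insns: List[str]) -> str:
    """Recover the (padded) bit string from the fake instructions."""
    return "".join(INSN_TO_GROUP[insn] for insn in insns)
```

For the FIG. 2 data “1010000”, the seven bits are padded to eight and encoded as four instructions instead of seven; the decoder must know the original length to drop the padding bit.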

What is claimed is:
 1. In a computer system, a method for encoding data as a watermark in an executable copy of a computer program generated by a compiler, with the data including a plurality of binary digits, the method comprising: executing an instruction number generating process to generate an instruction number specifying an insertion point in the computer program; and executing an encoding function that, for each binary digit in the data to be embedded, inserts the data bit, or a bit derived from or dependent on the data bit, into the computer program at the insertion point.
 2. In a computer system, a method for encoding data, the data including a plurality of binary digits, in an executable copy of a computer program generated by a compiler, the method comprising: executing an instruction number generating process to generate an instruction number specifying an insertion point in the computer program; and executing an encoding function that, for each binary digit in the data, encodes the data as a fake executable instruction, that, when executed, has no effect on the running of the program, and inserts the fake instruction generated at the insertion point.
 3. The method of claim 2 further comprising the acts of: executing a decoding function for decoding the fake executable instruction, inserted at the insertion point, into a binary digit; and outputting the binary digit as part of the decoded data.
 4. In a computer system, a method for encoding data as a watermark in an executable copy of a computer program generated by a compiler and for decoding the watermark to recover the data, with the data including a plurality of digital characters, the method comprising: for each character to be encoded: calling a location determining process to generate an insertion point in the executable copy of the computer program; inserting a representation of a character to be encoded at the insertion point; for each representation to be decoded: calling the location determining process to generate the location in the copy of the executable computer program of the representation to be decoded; decoding the representation to recover the character that was encoded.
 5. The method of claim 4 where the step of inserting a representation further comprises the act of: encoding the character to be encoded as a fake instruction that has no effect on the operation of a compiled program.
 6. The method of claim 4 further comprising the step of: deleting the representation to be decoded from the executable copy of the computer program.
 7. A computer program product including computer readable program code for causing a computer to encode data as a watermark in an executable copy of a computer program generated by a compiler and to decode the watermark to recover the data, with the data including a plurality of digital characters, said computer program product comprising: a computer readable medium having computer readable program code embodied therein, with said computer readable program code further comprising: computer readable encoding program code for causing a computer to encode a character by generating an insertion point in the executable copy of the computer program and inserting a representation of a character to be encoded at the insertion point; and computer readable decoding program code for causing a computer to decode a representation of a character encoded into the executable copy of the computer program by generating the location in the copy of the executable computer program of the representation to be decoded and decoding the representation to recover the character that was encoded.
 8. A computer program product including computer readable program code for causing a computer to encode data as a watermark in an executable copy of a computer program generated by a compiler, with the data including a plurality of digital characters, said computer program product comprising: a computer readable medium having computer readable program code embodied therein, with said computer readable program code further comprising: computer readable encoding program code for causing a computer to encode a character by generating an insertion point in the executable copy of the computer program and inserting a representation of a character to be encoded at the insertion point.
 9. A computer program product including computer readable program code for causing a computer to decode a watermark to recover data encoded as the watermark in an executable copy of a computer program generated by a compiler, with the data including a plurality of digital characters, said computer program product comprising: a computer readable medium having computer readable program code embodied therein, with said computer readable program code further comprising: computer readable decoding program code for causing a computer to decode a representation of a character encoded into the executable copy of the computer program by generating the location in the copy of the executable computer program of the representation to be decoded and decoding the representation to recover the character that was encoded. 